On the Relevance of Syntactic and Discourse Features for Author Profiling and Identification

Authors

  • Leo Wanner
  • Juan Soler
Abstract

The majority of approaches to author profiling and author identification focus mainly on lexical features, i.e., on the content of a text. We argue that syntactic dependency and discourse features play a significantly more prominent role than has been acknowledged in the past. We show that they achieve state-of-the-art performance in author and gender identification on a literary corpus while keeping the feature set small: the feature set comprises only 188 features and still outperforms the winner of the PAN 2014 shared task on author verification in the literary genre.


Similar resources

Author gender identification from text using Bayesian Random Forest

Nowadays high usage of users from virtual environments and their connection via social networks like Facebook, Instagram, and Twitter shows the necessity of finding out shared subjects in this environment more than before. There are several applications that benefit from reliable methods for inferring age and gender of users in social media. Such applications exist across a wide area of fields,...


A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure

Author profiling is a text classification technique, which is used to predict the profiles of unknown text by analyzing their writing styles. Author profiles are the characteristics of the authors like gender, age, nativity language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...


The Impact of Recasts on the Syntactic Accuracy of Iranian EFL University Students’ Oral Discourse

Among the major issues raised by classroom SLA researchers is the debate on the degree to which teacher’s or learner’s attention should be directed to linguistic features. However, one of the relevant variables in corrective feedback studies which seem to be less operationalized is the differential impact of different types of feedback on the accuracy of the oral performance of the participants...


STANCE AND ENGAGEMENT DISCOURSE MARKERS IN JOURNAL’S “AUTHOR GUIDELINES”

Over the past decade, there has been an increasing interest in the study of interactional metadiscourse markers in different contexts. However, not much research has been conducted about the discourse of journal author guidelines, especially the use of meta-discourse markers in this genre. Therefore, this corpus-based study had three main aims: 1) to delve deep into the types, frequencies and f...


Iranian Advanced EFL Learners’ Awareness and the Use of Marked Word Order: Discourse-pragmatically Motivated Variations

The present investigation was designed to study the production and comprehension of specific means for information highlighted by advanced Iranian learners of English as a Foreign Language. The study focused on the discourse-pragmatically motivated variations of the basic word order such as inversion, pre-posing, it- and Wh-clefts. After taking the Nelson test, a homogeneous group was settled. ...




Publication date: 2017